Active Learning of Input Grammars

Authors

  • Matthias Höschele
  • Alexander Kampmann
  • Andreas Zeller
Abstract

Knowing the precise format of a program’s input is a necessary prerequisite for systematic testing. Given a program and a small set of sample inputs, we (1) track the data flow of inputs to aggregate input fragments that share the same data flow through program execution into lexical and syntactic entities; (2) assign these entities names that are based on the associated variable and function identifiers; and (3) systematically generalize production rules by means of membership queries. As a result, we need only a minimal set of sample inputs to obtain human-readable context-free grammars that reflect valid input structure. In our evaluation on inputs like URLs, spreadsheets, or configuration files, our AUTOGRAM prototype obtains input grammars that are both accurate and very readable—and that can be directly fed into test generators for comprehensive automated testing.
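Step (3) of the abstract, generalizing production rules by means of membership queries, can be illustrated with a minimal sketch. This is not AUTOGRAM's implementation: the oracle `accepts`, the `CLASSES` table, and the function `generalize` are hypothetical names introduced here, and a regular expression stands in for actually running the program under test. The idea shown is only that a fragment of a sample input is replaced by members of a candidate character class, and the rule is generalized to that class if the program accepts every mutated input.

```python
import re
from typing import Callable, Optional

def accepts(candidate: str) -> bool:
    """Hypothetical membership oracle (assumption): a stand-in for running
    the program under test and checking whether it accepts the input."""
    return re.fullmatch(r"[a-z]+=[0-9]+", candidate) is not None

# Candidate generalizations (assumption: a tiny, fixed set of classes).
CLASSES = {
    "DIGITS": "0123456789",
    "LOWER": "abcdefghijklmnopqrstuvwxyz",
}

def generalize(sample: str, start: int, end: int,
               oracle: Callable[[str], bool]) -> Optional[str]:
    """Return the first class whose every member can replace
    sample[start:end] without the oracle rejecting the input."""
    for name, alphabet in CLASSES.items():
        # One membership query per character of the candidate class.
        if all(oracle(sample[:start] + ch + sample[end:]) for ch in alphabet):
            return name
    return None  # no generalization found; keep the literal fragment

sample = "port=80"
value_rule = generalize(sample, 5, 7, accepts)   # fragment "80"
key_rule = generalize(sample, 0, 4, accepts)     # fragment "port"
sep_rule = generalize(sample, 4, 5, accepts)     # fragment "="
```

In this toy setting a single sample suffices to learn that the value may be any digit and the key any lowercase letter, while the separator stays literal, which mirrors the abstract's claim that membership queries reduce the number of sample inputs needed.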

Similar resources

D. Béchet, A. Foret

This paper investigates the learnability by positive examples in the sense of Gold of Pregroup Grammars. In a first part, Pregroup Grammars are presented and a new parsing strategy is proposed. Then, theoretical learnability and non-learnability results for subclasses of Pregroup Grammars are proved. In the last two parts, we focus on learning Pregroup Grammars from a special kind of input call...

Learning Node Replacement Graph Grammars in Metabolic Pathways

This paper describes a graph-based relational, unsupervised learning algorithm to infer a node replacement graph grammar and its application to metabolic pathways. We search for frequent subgraphs and then check for overlap among the instances of the subgraphs in the input graph. If subgraphs overlap by one node, we propose a node replacement graph grammar production. We also can infer a hierarchy ...

Learning Multiple Languages in Groups

We consider a variant of Gold’s learning paradigm where a learner receives as input n different languages (in form of one text where all input languages are interleaved). Our goal is to explore the situation when a more “coarse” classification of input languages is possible, whereas more refined classification is not. More specifically, we answer the following question: under which conditions, ...

Rapidly Deploying Grammar-Based Speech Applications with Active Learning and Back-off Grammars

Grammar-based approaches to spoken language understanding are utilized to a great extent in industry, particularly when developers are confronted with data sparsity. In order to ensure wide grammar coverage, developers typically modify their grammars in an iterative process of deploying the application, collecting and transcribing user utterances, and adjusting the grammar. In this paper, we ex...

Partial Learning Using Link Grammars Data

Kanazawa has shown that several non-trivial classes of categorial grammars are learnable in Gold’s model. We propose in this article to adapt this kind of symbolic learning to natural languages. In order to compensate for the combinatorial explosion of the learning algorithm, we suppose that a small part of the grammar to be learned is given as input. That is why we need some initial data to test t...

Journal:
  • CoRR

Volume abs/1708.08731

Publication date: 2017